---
author: Marcus Rohrmoser
categories:
- en
- development
date: "2010-05-28T11:18:51+00:00"
tags:
- Apple
- Cocoa
- iPhone
- libxml2
- NSXMLParser
- RELAX NG
- RelaxNG
- SAX
- schema
- validate
- W3C
- XML
- xmllint
- xmlTextReader
title: 'iPhone: libxml2 & RELAX NG validation'
type: post
url: /2010/05/iphone-libxml2-relax-ng-validation/
yourls_shorturl:
- http://s.mro.name/1k
---
Having a [validating parser][1] in place can cut down the code needed to parse [XML][2] considerably, because you know exactly what you get. As mentioned in my last post about [RELAX NG & trang][3], I prefer [RELAX NG][4] over [W3C XML Schema][5] – which doesn't matter anyway because [Apple's suggested XML parser][6] doesn't validate at all.

So we have to go one level deeper and have a look at [libxml2][7].

[Apple's example "XMLPerformance"][8] helped me get started, but didn't do the trick because libxml2 allows validation for [`xmlDocPtr`][9] or [`xmlTextReader`][10] but not for [SAX parsers][11] as used in the example.

The [libxml2 examples][12] didn't help me much either, but luckily there's [xmllint available in source][13] (OSS just rocks), which does almost what we want. It first parses the XML into an `xmlDocPtr` and validates afterwards – and it does so for a reason:

You can have a validating `xmlTextReader` (via [`xmlTextReaderRelaxNGSetSchema`](http://xmlsoft.org/html/libxml-xmlreader.html#xmlTextReaderRelaxNGSetSchema)), but it won't detect [IDREF][14]s whose referenced [ID][15] is missing, and its error messages lack the name of the failing item. BTW – when validating against a [W3C schema this ID/IDREF check isn't available yet][16].

I finally discarded streaming XML parsing in favour of validation and ["push" parsing][17] (nice for data coming in over the wire) and did:

1. [load the RELAX NG regular form schema (watch out for the assignment of `relaxngschemas`)][18] – similar to xmllint schema loading,
2. [push the raw XML data into an `xmlDocPtr` (`xmlCreatePushParserCtxt`)][13] exactly like xmllint,
3. [validate the in-memory document (`xmlRelaxNGValidateDoc`)][19],
4. [turn it into an `xmlTextReader`][20],
5. [process the reader][21].

Wrap up:

* if you want full RELAX NG validation with libxml2 v2.7.3, forget about streamed parsing,
* wrap the document into an `xmlTextReader` if you want a SAXish programming model.

I may prepare and publish a `MroLibxml2Parser` inheriting from [`NSXMLParser`](http://developer.apple.com/iphone/library/documentation/Cocoa/Reference/Foundation/Classes/NSXMLParser_Class/Reference/Reference.html) and firing its callbacks, to make it easy to switch between validating and non-validating parser implementations, but this has to wait a bit. Stay tuned.

 [1]: http://www.w3.org/TR/REC-xml/#dt-valid
 [2]: http://en.wikipedia.org/wiki/XML
 [3]: http://blog.mro.name/2010/05/xml-toolbox-relax-ng-trang/
 [4]: http://www.oasis-open.org/committees/relax-ng/
 [5]: http://en.wikipedia.org/wiki/XML_Schema_(W3C)
 [6]: http://developer.apple.com/iphone/library/documentation/Cocoa/Reference/Foundation/Classes/NSXMLParser_Class/Reference/Reference.html
 [7]: http://www.xmlsoft.org/
 [8]: http://developer.apple.com/iphone/library/samplecode/XMLPerformance/Listings/Classes_LibXMLParser_m.html#//apple_ref/doc/uid/DTS40008094-Classes_LibXMLParser_m-DontLinkElementID_10
 [9]: http://www.xmlsoft.org/html/libxml-tree.html#xmlDocPtr
 [10]: http://www.xmlsoft.org/xmlreader.html
 [11]: http://www.xmlsoft.org/html/libxml-tree.html#xmlSAXHandler
 [12]: http://www.xmlsoft.org/examples/
 [13]: http://git.gnome.org/browse/libxml2/tree/xmllint.c#n2252
 [14]: http://www.w3.org/TR/xmlschema-2/#IDREF
 [15]: http://www.w3.org/TR/xmlschema-2/#ID
 [16]: https://bugzilla.gnome.org/show_bug.cgi?id=170795
 [17]: http://www.xmlsoft.org/examples/index.html#parse4.c
 [18]: http://git.gnome.org/browse/libxml2/tree/xmllint.c#n3513
 [19]: http://git.gnome.org/browse/libxml2/tree/xmllint.c#n2829
 [20]: http://xmlsoft.org/html/libxml-xmlreader.html#xmlReaderWalker
 [21]: http://www.xmlsoft.org/xmlreader.html#Walking
